Abstract: Wireless sensor networks (WSNs) need to support real-time periodic or sporadic queries of physical environments. In this work, we focus on periodic queries. For each periodic query issued by control applications in a WSN, the data from the source sensors should be collected and/or aggregated at the control center within a certain end-to-end delay. We first propose almost-tight necessary conditions for a set of queries to be schedulable by a WSN. We then develop a family of efficient and effective data collection/aggregation algorithms that meet the real-time quality of service (QoS) requirement under resource constraints by addressing three tightly coupled tasks: (1) routing tree construction for data aggregation/collection, (2) link activity scheduling, and (3) packet scheduling at nodes. Our theoretical analysis of the schedulability of these algorithms shows that they achieve a constant fraction of the maximum schedulable load. For overloaded networks, where not all queries can be satisfied, we propose an efficient query selection algorithm that approximately maximizes the total weight of the selected schedulable queries. Extensive simulations validate the proposed algorithms and corroborate our theoretical analysis.
1 Introduction

Recent years have seen the emergence of wireless sensor networks (WSNs), in which different types of sensors are deployed to monitor various aspects of the environment, such as light, temperature, acoustics, and ammonia. WSNs are also being deployed in a wide variety of other applications. In WSN applications, the data collected by the sensors are often streamed to a control center (called the sink). A typical implementation collects raw data from sensors and possibly performs in-network aggregation on the data streams to relieve some of the communication burden (power and bandwidth) on the network. For most control applications, the semantics and importance of data depend on the time at which the data are used. Thus, the observed events, and consequently the data from the source sensors, must be collected or aggregated at the control center within a certain delay. A key challenge in WSNs, then, is to meet the end-to-end delay requirements of control applications under wireless interference and the severe resource constraints of WSNs.

A number of protocols that balance communication cost, delay, and reliability have been proposed in the literature for data collection in WSNs [16]. However, little effort has been devoted to designing aggregation schemes that provide end-to-end performance guarantees for periodic queries. In this paper, we concentrate on designing effective scheduling of node activities to satisfy multiple heterogeneous queries. We are given a set of sensor nodes and a distinguished sink node. The sink node issues a set of periodic queries, each with a period, an initial release time, and a relative deadline for receiving the answer. The sink node expects to receive the corresponding (possibly aggregated) data from all sensor nodes in time. Given any interference model (note that we do not restrict ourselves to one specific interference model), the objective is to jointly design a routing tree for each query and an interference-aware schedule of activities for all nodes (i.e., when to transmit and which packets to transmit) such that the deadlines of all queries are met.

The problem of sporadic query scheduling for data aggregation/collection under various interference models has been extensively studied recently […], [34], […], [39]. At the same time, numerous significant results exist for related scheduling problems, such as scheduling periodic jobs on a single processor […], [18], […] and packet-level scheduling for the Internet [32], [42]. Surprisingly, only a few approaches […], [14], [33] have studied "real-time" data aggregation/collection scheduling in multi-hop WSNs. Unfortunately, although these methods perform reasonably well in simulations of WSNs, they do not provide theoretical performance guarantees. The major hurdle may be the absence of flow conservation when in-network data aggregation is performed.

Compared with prior work, our main contributions concern schedulability tests and effective scheduling algorithms. In summary, our main contributions are as follows:

A Necessary Condition for Schedulable Queries

We propose necessary conditions for a set of queries to be schedulable under various interference models: Theorem 2 gives a necessary condition for data aggregation queries, and Theorem 6 gives one for data collection queries. To derive these conditions, we introduce several novel concepts, such as the initial load of a node, the initial load of a region, and the relay load.

A Sufficient Condition for Schedulable Queries

We design efficient algorithms for constructing a routing tree for each query, scheduling the activities of each wireless node, and scheduling packets. We theoretically prove that the load of the queries schedulable by our methods is within a constant factor of the maximum schedulable load. Based on the proposed algorithms, Theorem 3 presents a sufficient condition for the schedulability of data aggregation queries, and Theorem 7 presents one for data collection queries under various interference models in WSNs.

Overloaded Network

When the load of all queries exceeds the network capacity (i.e., the WSN is overloaded with queries from control applications), we propose an efficient query-selection algorithm that carefully selects a subset of queries such that the total weight of the selected queries (which are schedulable by our algorithms) is at least a constant fraction of the optimum.

Simulation Results

We conduct extensive simulations to validate the proposed algorithms. Our simulation results in TinyOS corroborate our theoretical analysis.
The rest of the paper is organized as follows. Section 2 presents the system model. Sections 4 and 5 present schedulability results for data aggregation and data collection queries, respectively, under various interference models. Section 3 studies query scheduling in overloaded networks. We present our simulation results in Section 6, review related work in Section 7, and conclude the paper in Section 8.

2 System models 

2.1 Network Model 

Consider a WSN as a graph $G = (V, E)$ consisting of a set $V$ of $n$ sensor nodes, where $v_s \in V$ is the sink node, and $E$ is the set of communication links. Two nodes can communicate with each other if they are within each other's transmission range. Two links can transmit simultaneously only if they are interference-free. In this work, we extensively study the schedulability of queries in a WSN under several commonly used interference models: the Protocol Interference Model (PrIM), the RTS/CTS model, and the Physical Interference Model (PhIM), also called the Signal-to-Interference-plus-Noise Ratio (SINR) model. In PrIM […], each node $v_i$, in addition to having a uniform transmission range (normalized to 1), has an interference range $\rho$ such that any node $v_j$ is interfered with by the signal from $v_i$ if $\|v_i - v_j\| \le \rho$ and $v_j$ is not the intended receiver of the transmission from $v_i$. In the RTS/CTS model [1], for every transmitter-receiver pair, all nodes within the interference range of either the transmitter or the receiver cannot transmit. In PhIM [9], there is a threshold value $\beta > 0$ such that a node $v_j$ can correctly receive the data from a sender $v_i$ if and only if the signal-to-interference-plus-noise ratio at the receiver satisfies

$$\mathrm{SINR}_{v_i, v_j} = \frac{P_i \, d_{ij}^{-\kappa}}{N_0 + \sum_{v_k \in I} P_k \, d_{kj}^{-\kappa}} \ge \beta.$$

Here $d_{kj}$ is the Euclidean distance $\|v_k - v_j\|$, $N_0 > 0$ is the background noise, $P_i$ is the transmission power of node $v_i$ (we assume the transmission power is constant, i.e., $P_i = P$), $I$ is the set of other nodes actively transmitting while node $v_i$ is transmitting, and $\kappa > 2$ is the path loss exponent.
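To make the PhIM reception rule concrete, here is a minimal Python sketch that evaluates the SINR inequality for a single link. The function names and the default values of `P`, `N0`, `kappa`, and `beta` are illustrative assumptions, not parameters taken from the paper.

```python
import math


def sinr(sender, receiver, interferers, P=1.0, N0=0.1, kappa=3.0):
    """SINR at `receiver` for a transmission from `sender` under PhIM.

    Positions are (x, y) tuples. Every node uses the same power P, as assumed
    in the text; `interferers` are the other concurrently transmitting nodes.
    """
    signal = P * math.dist(sender, receiver) ** (-kappa)
    interference = sum(P * math.dist(k, receiver) ** (-kappa) for k in interferers)
    return signal / (N0 + interference)


def receives(sender, receiver, interferers, beta=2.0, **params):
    """True iff the receiver decodes the packet, i.e. SINR >= beta."""
    return sinr(sender, receiver, interferers, **params) >= beta
```

For example, a unit-length link with no concurrent senders has SINR $1/0.1 = 10$ and succeeds, while adding one interferer at distance 1 from the receiver drops the SINR to about 0.91, below $\beta = 2$.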
2.2 Query Model

Assume the control application issues a set of heterogeneous queries, and all source nodes generate data reports periodically at specified data rates. In practice, queries can differ in many aspects. The $i$-th query is characterized as follows. Let $S_i \subseteq V$ denote the subset of source nodes, each of which needs to answer this query. We assume that each source node $v \in S_i$ periodically generates a data unit to be collected at the sink $v_s$. We assume that it takes $x_i$ time to transmit a data unit of the $i$-th query over any link in the network; $x_i$ may differ across queries. For simplicity, we assume that $x_i$ already accounts for link reliability, data preparation time at nodes, and the varying data sizes needed to answer queries.

The $i$-th query is initially released at time $a_i$ and has an end-to-end delay requirement $d_i$ for obtaining the answer. In other words, the sink should receive the answer before time $f_i = a_i + d_i$. We assume that the $i$-th query has period $p_i$; then the $t$-th instance of this query is released at time $a_i + (t-1) \cdot p_i$, and the deadline for obtaining its answer is $f_i^t = a_i + (t-1) \cdot p_i + d_i$.
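As a quick check of the release-time and deadline formulas, the hypothetical helper below returns the release time and absolute deadline $f_i^t$ of the $t$-th instance of a query from $a_i$, $p_i$, and $d_i$.

```python
def instance_window(a_i, p_i, d_i, t):
    """Release time and absolute deadline f_i^t of the t-th instance (t >= 1)."""
    if t < 1:
        raise ValueError("instances are numbered from t = 1")
    release = a_i + (t - 1) * p_i
    return release, release + d_i
```

For a query with $a_i = 2$, $p_i = 10$, and $d_i = 7$, the third instance is released at time 22 and must be answered by time 29.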

In this work, we focus on two types of queries: data aggregation and data collection. Data aggregation allows in-network fusion of data packets from different sensors en route to the sink via multi-hop paths. For aggregation, we assume that the data generated by a sensor node at time $t$ carry the timestamp $t$, and that we can only aggregate data packets with the same or similar timestamps from a time period that depends on the control application. For simplicity, we assume that an intermediate node can aggregate multiple incoming packets into a single outgoing packet of the same size. A data collection query collects the raw data from every source node at the sink without any in-network processing.

Two questions are answered in this work. First, given a set $Q$ of $c$ data aggregation queries, each with its own period $p_i$, processing time $x_i$, end-to-end deadline $f_i$, and set of source nodes $S_i \subseteq V$, can the set of queries be satisfied, and if so, how can we design effective routing and scheduling algorithms that meet the specified requirements? The same question is answered for a given set of data collection queries. Second, when not all queries can be scheduled successfully and each query is associated with a positive weight, how can we design routing and scheduling protocols that maximize the total weight of the scheduled queries?

3 Drop Overloaded Queries 

In this section, we study scheduling for an overloaded sensor network in which not all arriving queries can be scheduled. We focus on data collection queries: given a set of data collection queries $Q$, assume the $i$-th query is associated with a weight $w_i$. The objective is to select and schedule a subset of queries $S \subseteq Q$ that maximizes the total weight of the scheduled queries.

We reduce our problem to a 0-1 knapsack problem as follows: given $c$ items, the $i$-th query is treated as an item of size $|S_i| \cdot x_i / p_i$ and weight $w_i$. The objective is to select a subset of items with total size at most $C$ such that the total weight of the selected items is maximized. Here $C$ is called the bag size, and we denote the 0-1 knapsack problem with bag size $C$ by KS($C$) for brevity. Our algorithm then consists of two phases:

Phase I: we enumerate every single query whose load $|S_i| \cdot x_i / p_i$ is no larger than 1 and select the one with the maximum weight as the first candidate solution;

Phase II: we use the solution of KS$\left(\frac{0.69}{c_2(M) \cdot c_4(M)}\right)$ as the second candidate solution.

The final solution is the one with the larger weight of these two candidate solutions; see Algorithm 1 for details. Note that we can design a joint routing and scheduling protocol that satisfies a set of data collection queries $Q$ under an interference model $M$ if

$$\sum_{q_i \in Q} \frac{|S_i| \cdot x_i}{p_i} \le \frac{0.69}{c_2(M) \cdot c_4(M)}.$$

Therefore, it is easy to verify the correctness of our solution.

The challenge here is to derive an approximation bound for this solution. Recall that any schedulable set $S$ of queries must satisfy $\sum_{q_i \in S} |S_i| \cdot x_i / p_i \le 1$, which implies that the optimal solution of our problem is no larger than the optimal solution of KS(1). Let $\mathrm{OPT}_{KS(1)}$ denote the optimal solution of KS(1). The following lemma shows that the selected queries have weight at least a constant fraction of the weight of $\mathrm{OPT}_{KS(1)}$.

Lemma 1: Let $w(A)$ denote the weight of the queries selected by Algorithm 1, and let $d = \frac{0.69}{c_2(M) \cdot c_4(M)}$; then $\frac{d}{2} \cdot w(\mathrm{OPT}_{KS(1)}) \le w(A)$.
Algorithm 1: Maximum Weighted Query Selection

1: $A[1] := \{\arg\max_{q_i:\ |S_i| x_i / p_i \le 1} w_i\}$;
2: $A[2] :=$ the solution returned by KS$\left(\frac{0.69}{c_2(M) \cdot c_4(M)}\right)$;
3: $A := \arg\max_{A' \in \{A[1], A[2]\}} w(A')$;
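Algorithm 1 does not say which solver is used for KS($d$). The sketch below is one hypothetical Python rendering that assumes integer query weights, so the knapsack can be solved exactly by a dynamic program over total weight (a pseudo-polynomial method chosen here for illustration, not necessarily the authors' choice). The names `Query`, `knapsack_exact`, and `select_queries` are ours, and `d` stands for $0.69/(c_2(M) \cdot c_4(M))$.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    weight: int        # w_i; assumed to be a positive integer for the DP below
    num_sources: int   # |S_i|
    x: float           # x_i, time to send one data unit over a link
    period: float      # p_i

    @property
    def load(self) -> float:
        """Item size |S_i| * x_i / p_i used in the knapsack reduction."""
        return self.num_sources * self.x / self.period


def knapsack_exact(queries, capacity):
    """Exact 0-1 knapsack: best[w] = (smallest total size reaching weight w, items)."""
    total = sum(q.weight for q in queries)
    best = [(float("inf"), ())] * (total + 1)
    best[0] = (0.0, ())
    for idx, q in enumerate(queries):
        # Iterate weights downward so each query is used at most once.
        for w in range(total, q.weight - 1, -1):
            size, chosen = best[w - q.weight]
            if size + q.load < best[w][0]:
                best[w] = (size + q.load, chosen + (idx,))
    # The largest weight whose minimal size fits in the bag is optimal.
    for w in range(total, -1, -1):
        if best[w][0] <= capacity:
            return list(best[w][1])
    return []


def select_queries(queries, d):
    """Algorithm 1 sketch: keep the heavier of the two candidate solutions."""
    def weight(indices):
        return sum(queries[i].weight for i in indices)

    # Phase I: heaviest single query whose load is at most 1.
    singles = [i for i, q in enumerate(queries) if q.load <= 1]
    cand1 = [max(singles, key=lambda i: queries[i].weight)] if singles else []
    # Phase II: solve KS(d).
    cand2 = knapsack_exact(queries, d)
    return cand1 if weight(cand1) >= weight(cand2) else cand2
```

For instance, with $d = 0.3$ and four queries of loads 0.9, 0.1, 0.15, 0.1 and weights 10, 3, 4, 5, Phase II picks the queries of weight 4 and 5 (total load 0.25), but Phase I's single query of weight 10 is returned because it is heavier.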



Together with the fact that the optimal solution of our problem is no larger than $w(\mathrm{OPT}_{KS(1)})$, Theorem 1 immediately follows.

Theorem 1: Algorithm 1 is a $d/2$-approximation for the maximum weighted query selection problem, where $d = \frac{0.69}{c_2(M) \cdot c_4(M)}$.

In the preceding discussion, we assumed that some queries are dropped when not all queries can be answered in time. In practice, it may be possible to partially satisfy all queries by carefully dropping some packets from some query flows (once every certain period), or by dropping some packets from some data-source nodes. Dropping packets (temporally or spatially) is feasible for some applications because of the possible (temporal and/or spatial) correlation among the data sensed by different sensors. Our algorithm can also be extended to this case; details are omitted due to space limitations.

4 Real-time Schedule for Aggregations

This section deals with scheduling for data aggrega- 
tion queries. We first propose both necessary con- 
ditions and sufficient conditions for schedulability 
of a given set of data aggregation queries. We then 
develop efficient routing protocols, link scheduling, 
and packet scheduling methods to satisfy a schedu- 
lable set of queries. 

4.1 Necessary Conditions for Schedulability

Our study of necessary conditions, and later of sufficient conditions, for schedulability relies on several novel concepts, namely the initial load and the relay load of a node (and/or a region). Let us first introduce the initial load. Given a WSN $G = (V, E)$ and a set of queries $Q$, the initial load of a node $u \in V$ is defined as $\ell_{G,Q}(u) = \sum_{j:\, u \in S_j} \frac{x_j}{p_j}$, where $x_j$ is the processing time, $p_j$ is the period, and $S_j \subseteq V$ is the set of source nodes in $G$ for the $j$-th query. If we define a region as any contiguous area in the two-dimensional plane, the initial load of a region $g$ is the sum of the initial loads of all nodes in $g$, i.e., $\ell_{G,Q}(g) = \sum_{u \in V(g)} \ell_{G,Q}(u)$, where $V(g)$ consists of all nodes of $V$ lying in region $g$.

In this section, we focus on the initial load of a special region, called an interference-aware region, which is a square in the two-dimensional plane whose side length is the interference-aware radius. Given an interference model $M$, the interference-aware radius $\lambda(M)$ is the maximum possible distance between two senders such that the corresponding two links interfere with each other under $M$. This means that two senders can transmit concurrently without interfering with each other if the distance between them is greater than $\lambda(M)$. We can compute $\lambda(M)$ from the parameters of model $M$. We then partition the two-dimensional plane using a set of vertical lines $a_i: x = i \cdot \lambda(M)$, $i \in \mathbb{Z}$, and horizontal lines $b_j: y = j \cdot \lambda(M)$, $j \in \mathbb{Z}$. Here $\mathbb{Z}$ is the set of all integers, and $i, j \in \mathbb{Z}$ are called the indices of vertical line $a_i$ and horizontal line $b_j$, respectively. Each square formed by a pair of neighboring vertical lines $a_v, a_{v+1}$ and a pair of neighboring horizontal lines $b_h, b_{h+1}$ is an interference-aware region, denoted $g_{v,h}$.
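The definitions of node and region initial loads translate directly into code. The Python sketch below assumes node positions are known, represents each query as a hypothetical `(x_j, p_j, S_j)` tuple, and assigns a node lying exactly on a grid line to the square on its right/upper side (a tie-breaking convention we chose; the text does not specify one).

```python
import math
from collections import defaultdict


def node_initial_loads(nodes, queries):
    """ell_{G,Q}(u) = sum of x_j / p_j over the queries j with u in S_j."""
    load = {u: 0.0 for u in nodes}
    for x, p, sources in queries:
        for u in sources:
            load[u] += x / p
    return load


def region_index(pos, lam):
    """Index (v, h) of the square g_{v,h} bounded by a_v, a_{v+1}, b_h, b_{h+1}."""
    x, y = pos
    return math.floor(x / lam), math.floor(y / lam)


def region_initial_loads(positions, node_load, lam):
    """ell_{G,Q}(g_{v,h}) = sum of the initial loads of the nodes lying in g_{v,h}."""
    region_load = defaultdict(float)
    for u, pos in positions.items():
        region_load[region_index(pos, lam)] += node_load[u]
    return dict(region_load)
```

Only regions that contain at least one node appear in the result, which matches the fact that empty regions have zero initial load.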

Observe that when scheduling the nodes' transmissions, for a clique in which no two nodes can transmit concurrently, the sum of all nodes' initial loads in the clique cannot exceed one. More generally, if at most a constant number $c_1(M)$ of nodes in an interference-aware region can transmit concurrently, then the initial load of this region is at most $c_1(M)$.

On the other hand, for a data aggregation query (say the $j$-th query), the sink node $v_s \in V$ must receive the aggregated data for every period $p_j$, which takes time $x_j$. Hence, for a set of queries $Q$, the load at the sink $v_s$, given by $\sum_{q_j \in Q} x_j / p_j$, is at most one if $Q$ can be answered within the delays specified by the queries.

In summary, Theorem 2 gives a necessary condition for a set of queries $Q$ to be schedulable, where a set of queries is schedulable iff all of its queries can be answered in time.

Theorem 2: If a set of data aggregation queries $Q$ is schedulable under an interference model $M$, then the following conditions must be satisfied:

$$\begin{cases} \ell_{G,Q}(g_{v,h}) \le c_1(M), & \forall g_{v,h}, \\ \sum_{q_j \in Q} \frac{x_j}{p_j} \le 1. \end{cases}$$

Here $\ell_{G,Q}(g_{v,h})$ is the initial load of an interference-aware region $g_{v,h}$, and the constant $c_1(M) \ge 1$ is the maximum number of nodes that can transmit concurrently in any interference-aware region under the interference model $M$.
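Theorem 2 can serve as a quick rejection test. The Python sketch below (hypothetical function name; region initial loads are passed in as a dictionary keyed by $(v, h)$, queries as `(x_j, p_j)` pairs) returns `False` when a necessary condition is violated, in which case $Q$ is certainly not schedulable; a `True` result does not by itself imply schedulability.

```python
def passes_theorem2(region_initial_load, queries, c1):
    """Check the necessary conditions of Theorem 2.

    region_initial_load: {(v, h): ell_{G,Q}(g_{v,h})}
    queries: list of (x_j, p_j) pairs
    c1: the constant c_1(M) of the interference model
    """
    regions_ok = all(load <= c1 for load in region_initial_load.values())
    sink_ok = sum(x / p for x, p in queries) <= 1
    return regions_ok and sink_ok
```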

Henceforth, all proofs are deferred to the appendix so that the main text can be read without interruption by technical details.

Next, we derive the value of $c_1(M)$ under various interference models. Note that for the physical interference model, the interference-aware radius $\lambda(M)$ is the same as the maximum transmission radius $r = \left(\frac{P}{\beta N_0}\right)^{1/\kappa}$. The maximum transmission radius $r$ can be viewed as a threshold on communication distance: a pair of nodes can communicate, and are thus connected, iff their mutual distance is at most $r$. In other words, a node $u$ cannot transmit data to another node $v$ that is more than $r$ away, even in the absence of other concurrent transmissions.

Lemma 2: The constant $c_1(M)$ in Theorem 2 is given as:

$$c_1(M) = \begin{cases} \ldots & \text{under PrIM}, \\ 36 & \text{under RTS/CTS}, \\ \ldots & \text{under PhIM}. \end{cases}$$

4.2 Efficient Algorithms for Scheduling Queries

Given a set of data aggregation queries subject to wireless interference constraints, we design effective and efficient algorithms to answer them. Since our approach is based on the concept of a connected dominating set (CDS) in a graph, we define it first.

In a graph $G = (V, E)$, a subset $V' \subseteq V$ is a dominating set (DS) if each node in $V$ is either in $V'$ or adjacent to some node in $V'$. Nodes in $V'$ are called dominators, whereas nodes not in $V'$ are called dominatees. A subset $C \subseteq V$ is a connected dominating set (CDS) if $C$ is a dominating set and $C$ induces a connected subgraph of $G$.
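The two properties in the definition can be checked mechanically. The Python sketch below tests whether a node set is a CDS of a graph given as an adjacency dictionary (a hypothetical representation); it only verifies the definition and is not the CDS construction method the paper adopts.

```python
from collections import deque


def is_dominating_set(adj, D):
    """Every node is in D or has a neighbor in D."""
    D = set(D)
    return all(u in D or any(v in D for v in adj[u]) for u in adj)


def is_connected_dominating_set(adj, C):
    """C is dominating and the subgraph induced by C is connected."""
    C = set(C)
    if not C or not is_dominating_set(adj, C):
        return False
    # Breadth-first search restricted to nodes of C.
    start = next(iter(C))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in C and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == C
```

On the path graph 1-2-3-4, the set {2, 3} is a CDS, whereas {1, 4} dominates the graph but does not induce a connected subgraph.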

4.2.1 General framework

Our framework for scheduling data aggregation 
queries under various interference models consists 
of several phases: 



Phase I: For each query, construct a routing tree for data aggregation.

Using the routing tree $T_i$ for the $i$-th query, all data at the source nodes $S_i$ of this query can be routed to the sink $v_s$. Note that any routing tree $T_i$ is a special Steiner tree connecting the terminals $S_i \cup \{v_s\}$, where $v_s \in V$ is the sink and $S_i$ is the set of source nodes. Since the routing tree will also be used for data collection (Section 5.2), we call it the data gathering routing tree.

Phase II: Construct a real-time transmission plan for each node.

Using the data gathering routing tree, we specify the data units each node must transmit in real time. For the $j$-th query with routing tree $T_j$, during each period, every leaf node in $T_j$ first adds its data to transmit; then every non-leaf node in $T_j$ generates one data unit and adds it to its buffer upon receiving all data units from its children in $T_j$ for this period of the $j$-th query. Note that a node $u$ may receive data from multiple children; however, it generates only one data unit by aggregating all packets received for the period with its own data, if any.

Phase III: Assign time for every node to transmit.

We propose a linear time assignment scheme in which each node in a region is assigned transmission time proportional to its relay load, defined as follows. Given a set of queries $Q$ and the corresponding data aggregation routing trees, we define the relay load of a node $u$ as $\mathcal{L}_{G,Q}(u) = \sum_{j:\, u \in T_j} \frac{x_j}{p_j}$. We then define the relay load of a region $g$ as the sum of all nodes' relay loads in this region: $\mathcal{L}_{G,Q}(g) = \sum_{u \in V(g)} \mathcal{L}_{G,Q}(u)$, where $V(g) \subseteq V$ is the set of all nodes of $V$ lying in region $g$. The relay load covers both the initial load and the load of relaying data. Thus, the relay load of a node can be viewed as the fraction of time during which the node actively transmits data over the routing structure. Therefore, a node with more packets (and thus more relay load) is assigned more transmission time.
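A minimal sketch of the relay load computation, assuming each routing tree $T_j$ is given as a set of node IDs and each query as an `(x_j, p_j)` pair (a hypothetical representation). Region relay loads then follow by summing over the nodes of each region, exactly as for initial loads.

```python
from collections import defaultdict


def relay_loads(trees, queries):
    """L_{G,Q}(u) = sum of x_j / p_j over the queries j whose tree T_j contains u.

    In data aggregation a node sends one (aggregated) data unit per period of
    every query whose routing tree it belongs to.
    """
    load = defaultdict(float)
    for tree, (x, p) in zip(trees, queries):
        for u in tree:
            load[u] += x / p
    return dict(load)
```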

Phase IV: Select data packets to transmit for each node.

We use the rate monotonic method in this phase. Note that the rate monotonic method is effective in ensuring that every packet meets its deadline for periodic jobs [18].

Let us now describe our methods for each phase 
separately. 
Fig. 1. Region Coloring: (a) for the protocol interference model and the RTS/CTS model; (b) for the physical interference model.



4.2.2 Constructing a data gathering routing tree

The construction of routing trees is similar under the various interference models. Given a communication graph $G = (V, E)$, we select a CDS $T_{CDS}$ of $G$ using an existing approach [26]. We then construct a spanning tree $T_G$ by connecting each node not in the CDS to a neighboring dominator. For the $i$-th query, we prune every node $u \in V$ and the corresponding link $u\,p(u)$ (the link from $u$ to its parent $p(u)$) in $T_G$ if the subtree of $T_G$ rooted at $u$ (denoted $T_G^u$) contains no source node of the query, i.e., $S_i \cap T_G^u = \emptyset$. The pruning operations result in a routing tree $T_i$ for the $i$-th query.

For PhIM, if the length of a communication link is very close to $r$, the SINR at the link's receiver is barely above the threshold, and the link's transmission will fail with high probability (w.h.p.). Hence, a link whose length is very close to $r$ is not a good candidate for transmission in practice. Given a parameter $\delta$, if we connect every pair of nodes at distance at most $\delta r$, we obtain a reduced communication graph, denoted $G(V, \delta r)$, which is a subgraph of the original communication graph $G(V, r)$. If $G(V, \delta r)$ is connected, we can perform data transmissions in this subgraph. Therefore, under PhIM, we construct data gathering routing trees in the reduced communication graph $G(V, \delta r)$ instead of in $G(V, r)$.

After we construct a data gathering routing tree for each query, the second phase is to construct a real-time transmission plan for each node. We first need to determine which queries the node is involved in. Observe that a node $u$ is involved in the $j$-th query if (1) $u$ is a source node for this query, i.e., $u \in S_j$, or (2) $u$ is a relay node for this query. In either case, $u \in T_j$. Thus, we can test $u \in T_j$ to determine whether node $u$ is involved in the $j$-th query.

If a node $u$ is involved in the $j$-th query, then during each period $p_j$, the node needs to add a packet for this query to its transmission plan. The added packet is either an original packet or an aggregated one combining the data received from all its children in the corresponding data gathering routing tree $T_j$. Each node stores its transmission plan in its buffer.
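The pruning step of Section 4.2.2 and the involvement test $u \in T_j$ can be sketched in Python as follows, assuming the spanning tree $T_G$ is given as a hypothetical parent map `parent[u] = p(u)` rooted at the sink. A node's subtree intersects $S_i$ exactly when the node lies on the tree path from some source to the sink, so this sketch walks upward from each source instead of pruning top-down; both give the same tree.

```python
def prune_tree(parent, sources, sink):
    """Return the node set of T_i obtained by pruning T_G for source set S_i.

    parent: dict mapping every non-sink node u to its parent p(u) in T_G.
    A node survives iff its subtree in T_G intersects S_i, i.e. iff it is an
    ancestor of (or equal to) some source; the sink is always kept.
    """
    kept = {sink}
    for s in sources:
        u = s
        # Climb toward the sink until reaching a node that is already kept.
        while u not in kept:
            kept.add(u)
            u = parent[u]
    return kept


def is_involved(u, tree_nodes):
    """Node u takes part in query j iff it is a source or relay, i.e. u in T_j."""
    return u in tree_nodes
```

For a tree with sink `s`, children `a` and `b`, and leaves `c`, `d` under `a` and `e` under `b`, the query with sources {c, e} keeps {s, a, b, c, e}, and node `d` is not involved.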



4.2.3 Interference-aware node scheduling 

After the transmission plans are computed, each node may have some packets in its transmission plan. The third phase schedules (i.e., assigns) concrete transmission time to each node while ensuring interference-freeness. The proposed scheduling consists of two steps: (1) determine which regions to select nodes from, called active regions; and (2) determine which node in an active region transmits.

First, we color all interference-aware regions such that any two closest regions with the same color in a row or column are separated by $K(M) - 1$ regions, where $K(M)$ is a constant depending on the interference model. Clearly, the chromatic number of this coloring is $c_2(M) = K(M)^2$. Specifically, $c_2(M) = 4$ under the protocol interference model and the RTS/CTS model, and $c_2(M)$ is some other constant under the physical interference model (see Fig. 1). Region coloring guarantees that if only one node is selected to transmit from each interference-aware region of the same color, the transmissions are interference-free under the given interference model, irrespective of the positions of the receivers. Using this property, at any time we activate all regions of a single color to ensure interference-freeness.

Second, we assign transmission time to the nodes in an active region. Clearly, a node with more relay load needs more time. We propose a linear time assignment scheme in which each node in an active region is assigned transmission time proportional to its relay load. Given a time duration $T$ (here $T \ge p_j$, $\forall j$) during which an interference-aware region $g$ is active, we assign each node $u$ in region $g$ transmission time $T \cdot \mathcal{L}_{G,Q}(u) / \mathcal{L}_{G,Q}(g)$.

The details are shown in Algorithm 2, which is performed every $c_2(M) \cdot T$ time units, so that each region is active for exactly $T$ time units in each such round. When a region is active, we apply the linear time assignment to each node in this region. Note that the assignment does not require global coordination between different regions; thus, it can be implemented efficiently in a distributed manner.



Algorithm 2: Interference-aware node scheduling

Input: Routing trees for all queries
1: $K(M) \leftarrow \lceil \sqrt{c_2(M)} \rceil$;
2: for each interference-aware region $g_{v,h}$, where $v, h \in \mathbb{Z}$ and $g_{v,h}$ contains nodes, do
3:     assign the region the color
4:     $(v \bmod K(M)) \cdot K(M) + (h \bmod K(M))$;
5: for $i = 0, \ldots, K(M) - 1$ and $j = 0, \ldots, K(M) - 1$ do
6:     for each region $g_{v,h}$ of the $(i \cdot K(M) + j)$-th color, where $v, h \in \mathbb{Z}$ and $g_{v,h}$ contains nodes, do
7:         for each node $u$ in region $g_{v,h}$ do
8:             assign the node transmission time $T \cdot \mathcal{L}_{G,Q}(u) / \mathcal{L}_{G,Q}(g_{v,h})$;
9: return the transmission time of each node.
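A compact Python sketch of Algorithm 2 follows. It assumes node positions and relay loads are already known and that $c_2(M)$ is a perfect square (as for PrIM and RTS/CTS, where $c_2(M) = 4$). It lays out one round of length $c_2(M) \cdot T$ in which color $k$ is active during $[kT, (k+1)T)$; placing the nodes of a region in consecutive sub-intervals is our own choice, since the algorithm only fixes each node's share. The function name and data layout are hypothetical.

```python
import math
from collections import defaultdict


def schedule_nodes(positions, relay_load, lam, c2, T):
    """Sketch of Algorithm 2: returns {u: (start, duration)} within one round.

    positions: {u: (x, y)}; relay_load: {u: L_{G,Q}(u)}; lam: lambda(M);
    c2: c_2(M), assumed to equal K(M)^2; T: active time per color.
    """
    K = math.ceil(math.sqrt(c2))                  # line 1: K(M) = ceil(sqrt(c_2(M)))
    regions = defaultdict(list)
    for u, (x, y) in positions.items():
        regions[(math.floor(x / lam), math.floor(y / lam))].append(u)

    plan = {}
    for (v, h), members in regions.items():
        color = (v % K) * K + (h % K)             # lines 2-4; Python's % is non-negative
        region_load = sum(relay_load.get(u, 0.0) for u in members)
        start = color * T                         # start of this color's active window
        for u in sorted(members):
            share = relay_load.get(u, 0.0) / region_load if region_load > 0 else 0.0
            duration = T * share                  # line 8: T * L(u) / L(g_{v,h})
            plan[u] = (start, duration)
            start += duration
    return plan
```

With $\lambda(M) = 1$, $c_2(M) = 4$, and $T = 10$, two nodes in $g_{0,0}$ with relay loads 0.2 and 0.3 receive 4 and 6 time units of color 0's window, while a lone node in $g_{1,0}$ receives the whole window of color 2, starting at time 20.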



4.2.4 Packet scheduling

When it is a node's turn to transmit, the fourth phase selects packet(s) from the node's transmission plan to transmit.

We use a rate monotonic method [18], [29] to select packets from the node's transmission plan:

1) All packets of the current period have lower priorities than those of all previous periods.

2) The priorities of the packets of all queries are assigned on a rate-monotonic basis. In other words, a packet of the current instance of a query with a shorter period has a higher priority than the packet of the current instance of a query with a longer period (at absolute time $t$, a packet belongs to the current instance if it is produced during the time period containing $t$). Similarly, a packet of a previous instance of a query with a shorter period has a higher priority than a packet of a previous instance of a query with a longer period. Ties are broken by the lexicographic order of (current/previous, $p_j$, ID).

3) All packets of previous instances of the same query are scheduled on a first-in-first-out basis.

As proved in [18], the rate monotonic method guarantees that every packet is transmitted before its deadline if each node has utilization (the ratio of its relay load to the fraction of time assigned to it) of at most $n(2^{1/n} - 1)$, where $n$ is the number of queries the node is involved in. Note that $n \le c$. Since this bound decreases toward $\ln 2 \approx 0.693$ as $n$ grows, as long as each node has utilization below 69%, all packets meet their deadlines.
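The three priority rules can be encoded as a single sort key. The Python sketch below uses our own hypothetical `Packet` layout, with each packet tagged by the start of the period in which it was produced, and also evaluates the utilization bound $n(2^{1/n} - 1)$; since the bound stays above $\ln 2 \approx 0.693$ for every $n$, the 0.69 threshold is safe regardless of how many queries a node serves.

```python
from dataclasses import dataclass


def rm_utilization_bound(n: int) -> float:
    """Liu-Layland bound n * (2^(1/n) - 1); it decreases toward ln 2 ~ 0.693."""
    return n * (2 ** (1 / n) - 1)


@dataclass(frozen=True)
class Packet:
    query_id: int    # ID of the query the packet belongs to
    period: float    # p_j of that query
    release: float   # start of the period in which the packet was produced


def priority_key(pkt: Packet, now: float):
    """Smaller key means higher priority, following rules 1-3 of the text."""
    is_current = pkt.release <= now < pkt.release + pkt.period
    # Rule 1: previous-instance packets (False) precede current ones (True).
    # Rule 2: shorter period first; ties broken by query ID.
    # Rule 3: among previous instances of the same query, FIFO by release time.
    return (is_current, pkt.period, pkt.query_id, pkt.release)


def next_packet(buffer, now):
    """Packet a node should send when its transmission time arrives."""
    return min(buffer, key=lambda pkt: priority_key(pkt, now))
```

At time 10, a leftover packet of a period-5 query released at time 5 is sent before a leftover period-8 packet, and both precede the current-instance packets.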

4.3 Sufficient Conditions for Schedulability

In Section 4.2, we proposed a family of algorithms to schedule periodic data aggregation queries. We now prove that our algorithms are feasible. Here, an algorithm is feasible for a given set of queries iff, by using the algorithm, we can ensure interference-freeness and answer all queries within their deadlines.

Lemma 3: The proposed algorithms in Section 4.2 can answer a set of data aggregation queries without interference if

$$\begin{cases} \mathcal{L}_{G,Q}(g_{v,h}) \le 0.69 / c_2(M), & \forall g_{v,h}, \\ \sum_{q_j \in Q} \frac{x_j}{p_j} \le 0.69. \end{cases}$$

Here $\mathcal{L}_{G,Q}(g_{v,h})$ is the relay load of an interference-aware region $g_{v,h}$, and $c_2(M)$ is the chromatic number of the region coloring such that, if we select only one node from each of the interference-aware regions with the same color to transmit, interference-freeness is guaranteed under the interference model $M$.

Lemma 4: The proposed algorithms in Section 4.2 can answer all queries within their deadlines if $d_i \ge c_2(M) \cdot T \cdot 2R$, $\forall i$. Here $d_i$ is the delay requirement of the $i$-th query, $c_2(M)$ is given in Lemma 5, and $R$ is the radius of the communication graph $G$.

Observe that a smaller $T$ means that we can satisfy more queries with tighter deadlines. On the other hand, in Algorithm 2, each node $u$ in region $g_{v,h}$ is assigned time $T \cdot \mathcal{L}_{G,Q}(u) / \mathcal{L}_{G,Q}(g_{v,h})$ to transmit the packets in its transmission plan (or buffer). This time should be sufficient to transmit at least one packet; in other words, $T$ must be bounded from below. When the time to transmit a packet is treated as 1 unit, we have

$$T \ge \left\lceil \max_{g_{v,h}} \max_{u \in g_{v,h}} \frac{\mathcal{L}_{G,Q}(g_{v,h})}{\mathcal{L}_{G,Q}(u)} \right\rceil.$$

The feasibility verification (Lemmas 3 and 4) implies a sufficient condition for the schedulability of a given set of queries:

Theorem 3: The condition of Lemma 3 is a sufficient condition for the schedulability of a set of data aggregation queries.
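Lemmas 3 and 4, together with the lower bound on $T$, can be combined into a single feasibility check. The Python sketch below is our own illustration: regions are given as a hypothetical `{(v, h): [nodes]}` map, queries as `(x_j, p_j, d_j)` triples, and we take the smallest $T$ allowed by the text, namely $T \ge p_j$ for all $j$ and $T \ge \lceil \max \mathcal{L}_{G,Q}(g)/\mathcal{L}_{G,Q}(u) \rceil$.

```python
import math


def min_frame_length(region_members, relay_load, periods):
    """Smallest T with T >= p_j for every query j and T * L(u) / L(g) >= 1
    for every node u with positive relay load, i.e. T >= ceil(max L(g) / L(u))."""
    ratio = 0.0
    for members in region_members.values():
        g_load = sum(relay_load[u] for u in members)
        for u in members:
            if relay_load[u] > 0:
                ratio = max(ratio, g_load / relay_load[u])
    return max(math.ceil(ratio), max(periods))


def sufficient_condition(region_members, relay_load, queries, c2, R):
    """Check Lemma 3 and Lemma 4 for queries given as (x_j, p_j, d_j) triples.

    region_members: {(v, h): [nodes in g_{v,h}]}; relay_load: {u: L_{G,Q}(u)};
    c2: c_2(M); R: radius of the communication graph G.
    """
    # Lemma 3, first condition: L(g_{v,h}) <= 0.69 / c_2(M) for every region.
    for members in region_members.values():
        if sum(relay_load[u] for u in members) > 0.69 / c2:
            return False
    # Lemma 3, second condition: sum_j x_j / p_j <= 0.69.
    if sum(x / p for x, p, _ in queries) > 0.69:
        return False
    # Lemma 4 with the smallest admissible T: d_j >= c_2(M) * T * 2R.
    T = min_frame_length(region_members, relay_load, [p for _, p, _ in queries])
    return all(d >= c2 * T * 2 * R for _, _, d in queries)
```

With $c_2(M) = 4$, $R = 3$, and a single query of period 10, the smallest admissible $T$ is 10, so the delay requirement must be at least $4 \cdot 10 \cdot 2 \cdot 3 = 240$.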

Next, we derive the value of $c_2(M)$ under various interference models.

Lemma 5: The value $c_2(M)$ used in Lemma 3 is

$$c_2(M) = \begin{cases} 4 & \text{under PrIM and RTS/CTS}, \\ O(1) & \text{under PhIM}. \end{cases}$$

4.4 Main Theorem

In this section, we study the gap between the necessary condition (Theorem 2) and the sufficient condition (Theorem 3) for the schedulability of a set of data aggregation queries. We address this question for two cases: (1) queries on all nodes, i.e., $S_i = V$ for the $i$-th query, and (2) queries on a subset of nodes, i.e., $S_i \subset V$ for the $i$-th query.

4.4.1 Queries on All Nodes

For queries on all nodes, every node must report one packet in every period of the j-th query, so the initial load of each node is $\sum_j 1/p_j$. On the other hand, for data aggregation, each node needs to transmit only one packet during a period of the j-th query. The relay load of a node is therefore at most $\sum_j 1/p_j$, whatever routing tree is used. A node's initial load cannot exceed its relay load, so the two are equal for every node. As a corollary, the relay load of each interference-aware region equals its initial load. This property makes Theorem 4 below easy to prove.
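The following sketch shows why relay load and initial load coincide when queries cover all nodes. It computes the per-node relay load in an aggregation tree. It assumes the routing tree is given as a hypothetical parent map, and that a node with at least one source in its subtree sends exactly one aggregated packet per period. It is an illustration, not the routing algorithm of Section 4.2.

```python
from fractions import Fraction


def nodes_with_source_below(parent, sources):
    """Nodes whose subtree (including the node itself) contains a source."""
    marked = set()
    for s in sources:
        v = s
        # Walk toward the sink and stop at the first node already marked.
        while v is not None and v not in marked:
            marked.add(v)
            v = parent[v]
    return marked


def aggregation_relay_load(parent, queries):
    """Per-node relay load under aggregation.

    parent: node -> parent in the routing tree (None for the sink).
    queries: list of (period p_j, source set S_j).
    """
    load = {v: Fraction(0) for v in parent}
    for period, sources in queries:
        for v in nodes_with_source_below(parent, sources):
            if parent[v] is not None:  # the sink does not transmit
                load[v] += Fraction(1, period)
    return load


tree = {0: None, 1: 0, 2: 0, 3: 1}
everyone = set(tree)
print(aggregation_relay_load(tree, [(4, everyone), (6, everyone)]))
# every non-sink node carries 1/4 + 1/6 = 5/12, the same as its initial load
```

When only node 3 is a source, node 1 still relays one packet per period and node 2 relays nothing. This is the subset case in which the routing tree starts to matter.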

Theorem 4: When all nodes have data, a constant approximation ratio of $c_1(\mathcal{M}) c_2(\mathcal{M})/0.69$ can be achieved for schedulability of a set of data aggregation queries $\mathcal{Q}$ under various interference models $\mathcal{M}$.

4.4.2 Queries on Subset of Nodes 

For queries on a subset of nodes, different routing structures (sets of data gathering routing trees) for the given set of queries can strongly affect the relay load of a region, even when the region's initial load is fixed. In an extreme case, the relay load may be large while the initial load is zero (see Figure 2). However, we can compare the necessary condition (Theorem 2) and the sufficient condition (Theorem 3) for schedulability from another perspective.

Lemma 6: Given an interference model $\mathcal{M}$ and a set of data aggregation queries $\mathcal{Q}$, and using our routing algorithms in Section 4.2, the relay load of every interference-aware region $g_{v,h}$ is less than $0.69/(2 \cdot c_2(\mathcal{M}))$ if

$$\sum_j \frac{1}{p_j} \le \frac{0.69}{2 \cdot c_2(\mathcal{M}) \cdot (c_4(\mathcal{M}) - 1)}. \tag{3}$$

Here $c_4(\mathcal{M})$ is the maximum size of a CDS inside an interference-aware region, plus one.
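The following sketch checks Equation (3) exactly using rational arithmetic. It assumes that (3) reads $\sum_j 1/p_j \le 0.69/(2 c_2(\mathcal{M})(c_4(\mathcal{M})-1))$ and that all periods are integers. The function name is hypothetical. Plugging in the RTS/CTS constants from Lemmas 5 and 7 shows how small the admissible total query rate becomes.

```python
from fractions import Fraction


def eq3_holds(periods, c2, c4):
    """Check sum_j 1/p_j <= 0.69 / (2 * c2 * (c4 - 1)) with exact fractions."""
    if c4 <= 1:
        raise ValueError("c4 must be larger than 1")
    lhs = sum(Fraction(1, p) for p in periods)
    rhs = Fraction(69, 100) / (2 * c2 * (c4 - 1))
    return lhs <= rhs


# RTS/CTS constants from Lemmas 5 and 7: c2 = 4, c4 = 200.
print(eq3_holds([5000, 5000], c2=4, c4=200))  # True:  2/5000 = 0.0004 <= ~0.000433
print(eq3_holds([2000], c2=4, c4=200))        # False: 1/2000 = 0.0005
```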

Corollary 1: Given a set of data aggregation queries $\mathcal{Q}$ under an interference model $\mathcal{M}$, Equation (3) is a sufficient condition for schedulability.

Theorem 5: When a subset of nodes have data for each query, we can achieve an approximation ratio of $\max\{2c_1(\mathcal{M})c_2(\mathcal{M})/0.69,\; 2c_2(\mathcal{M})(c_4(\mathcal{M})-1)/0.69\}$ for schedulability of a set of data aggregation queries $\mathcal{Q}$ under various interference models.

We derive the value of $c_4(\mathcal{M})$ under various interference models.

Lemma 7: The constant

$$c_4(\mathcal{M}) = \begin{cases} 8 \cdot (\rho + 4)^2 & \text{under PrIM},\\ 200 & \text{under RTS/CTS},\\ 200 & \text{under PhIM}. \end{cases}$$

We next show by example that the sufficient condition given in Corollary 1 is (approximately) tight. Indeed, the approximation ratio is independent of $c_4(\mathcal{M})$.

In Figure 2, node $v_s \in V$ is the sink. There are $\sqrt{k}$ evenly spaced vertical lines, with distance $d_h = 1 + \epsilon$ between consecutive lines (e.g., the distance between u and v is $1 + \epsilon$). Additionally, $\sqrt{k} - 1$ nodes, such as w between u and v, act as bridges that keep the network connected. Clearly, $k = O(\rho^2)$ nodes are deployed in the interference-aware region, and the size of the CDS in this region is $k - 1$; hence $c_4(\mathcal{M}) = k$. The residual network (all nodes outside the region) is connected to the sink $v_s$ only through the red path from node t to $v_s$. We assume that all source nodes ($S_j$ for the j-th query) are located in the residual network.

Fig. 2. An example of node placement in an interference-aware region.

To aggregate data to the sink $v_s$, packets must strictly follow the red path shown in Figure 2. It is easy to verify that the relay load of the interference-aware region is $c_4(\mathcal{M}) \cdot \sum_j 1/p_j$, while its initial load is zero. A necessary condition for schedulability of the example network is therefore an upper bound on $\sum_j 1/p_j$ of order $1/c_4(\mathcal{M})$. One can verify that the sufficient condition in Corollary 1 matches this necessary condition to within a factor of at most $c_1(\mathcal{M}) \cdot c_2(\mathcal{M})$, which again does not depend on $c_4(\mathcal{M})$.

5 Real-time Schedule for Collections

5.1 Necessary Conditions for Schedulability

Consider a WSN with communication graph $G = (V, E)$ and a set of data collection queries $\mathcal{Q}$. Assume the j-th query is associated with the time $x_j$ needed to transmit one of its packets over a communication link, its period $p_j$, its deadline $d_j$, and a set of source nodes $S_j$ in G that hold the initial raw data. For the i-th query, the sink node must receive all the raw data from $S_i$, regardless of which data collection routing tree is used. The initial load of the sink node due to the i-th query is therefore exactly $|S_i| \cdot x_i / p_i$. If a set of queries $\mathcal{Q}$ can be satisfied, the total initial load of the sink node, $\sum_i |S_i| \cdot x_i / p_i$, is at most one. This gives the following necessary condition for schedulability.

Theorem 6: If a set of data collection queries $\mathcal{Q}$ under an interference model $\mathcal{M}$ is schedulable, then

$$\sum_i \frac{|S_i| \cdot x_i}{p_i} \le 1. \tag{4}$$
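Condition (4) is easy to evaluate. The minimal sketch below assumes each query is given as a hypothetical integer triple $(|S_i|, x_i, p_i)$ and uses exact fractions.

```python
from fractions import Fraction


def sink_initial_load(queries):
    """queries: iterable of (|S_i|, x_i, p_i). Returns sum_i |S_i| * x_i / p_i."""
    return sum(Fraction(n) * Fraction(x) / Fraction(p) for n, x, p in queries)


def passes_eq4(queries):
    """Necessary condition (4): the sink's initial load is at most one."""
    return sink_initial_load(queries) <= 1


print(sink_initial_load([(10, 1, 20), (5, 2, 40)]))  # 3/4
print(passes_eq4([(30, 1, 20)]))                     # False: the load is 3/2
```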



5.2 Efficient Algorithms for Scheduling Queries

In this section, we design effective algorithms for scheduling data collection queries under various interference models. Data collection differs from aggregation in that no in-network processing is allowed. For data collection queries, each node must therefore transmit its own raw data (if any) and all data it receives towards the sink. Since the difference lies only in the transmission plan, we still apply the general framework proposed in Section 4.2.1 and modify only the transmission-plan construction phase.

Phase I: Construct a data processing routing tree in the network for each query (see the method in Section 4.2.1).

Phase II: Construct a transmission plan for each 
node. 

Consider the j-th query. If node u is a source node for this query, i.e., $u \in S_j$, then during each period $p_j$ node u adds one packet of its raw data for this query to its transmission plan. In addition, if u is a non-leaf node in the data processing routing tree $T_j$, then during each period u must forward to its parent in $T_j$ all packets it receives from the subtree of $T_j$ rooted at u. Node u therefore adds every packet it receives to its transmission plan. In total, node u adds $|D_{T_j}(u) \cap S_j|$ packets to its transmission plan during each period of the j-th query.
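The per-period packet counts of Phase II can be computed as follows. The sketch assumes $T_j$ is given as a hypothetical parent map (None marks the sink) and takes $D_{T_j}(u)$ to be the subtree rooted at u, including u itself. It is an illustration, not the paper's implementation.

```python
def collection_plan_sizes(parent, sources):
    """Packets node u adds to its plan per period: |D_T(u) ∩ S_j|.

    parent: node -> parent in T_j (None for the sink). D_T(u) is taken
    to be the subtree rooted at u, including u itself.
    """
    count = {v: 0 for v in parent}
    for s in sources:
        v = s
        # The raw packet of source s is handled by s and by every ancestor of s.
        while v is not None:
            count[v] += 1
            v = parent[v]
    return count


tree = {0: None, 1: 0, 2: 0, 3: 1, 4: 1}
print(collection_plan_sizes(tree, {2, 3, 4}))
# {0: 3, 1: 2, 2: 1, 3: 1, 4: 1}; the sink's entry counts packets it receives
```

In aggregation, every node with a source below it sends one packet per period. In collection, the count grows toward the sink, which is why Theorem 6 is stated in terms of the sink's load.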

Phase III: Assign transmission time to each node to ensure interference-freeness (see the method in Section 4.2.3).

Phase IV: Select data packets to transmit during a node's transmission time (see the method in Section 4.2.4).

5.3 Sufficient Conditions for Schedulability 

In Section 5.2 we proposed a family of algorithms for scheduling data collection queries. We next prove that these algorithms are feasible. Since we also use Algorithm 2 to assign transmission times to nodes, Lemma 3 guarantees that our algorithms are interference-free under various interference models. It remains to prove that our algorithms answer all queries in time.

Lemma 8: The proposed algorithms in Section 5.2 can answer all data collection queries if

$$\sum_i \frac{|S_i| \cdot x_i}{p_i} \le \frac{0.69}{c_2(\mathcal{M}) \cdot c_4(\mathcal{M})}. \tag{5}$$

Here $c_2(\mathcal{M})$ is given in Lemma 5 and $c_4(\mathcal{M}) > 1$ is given in Lemma 7.
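Theorem 6 and Lemma 8 together split query sets into three classes. The sketch below assumes that the left-hand side of (5) is the same sink load $\sum_i |S_i| x_i / p_i$ that appears in (4). The function and its return labels are hypothetical.

```python
from fractions import Fraction


def classify_collection(queries, c2, c4):
    """Place a query set relative to the necessary and sufficient conditions.

    queries: iterable of (|S_i|, x_i, p_i).
    """
    load = sum(Fraction(n) * Fraction(x) / Fraction(p) for n, x, p in queries)
    if load <= Fraction(69, 100) / (c2 * c4):
        return "schedulable"       # sufficient condition (5) holds
    if load <= 1:
        return "undetermined"      # necessary (4) holds, sufficient (5) does not
    return "not schedulable"       # necessary condition (4) is violated


# RTS/CTS constants: c2 = 4 (Lemma 5), c4 = 200 (Lemma 7); bound 0.69/800.
print(classify_collection([(1, 1, 2000)], 4, 200))  # schedulable (load 0.0005)
print(classify_collection([(10, 1, 20)], 4, 200))   # undetermined (load 0.5)
print(classify_collection([(30, 1, 20)], 4, 200))   # not schedulable (load 1.5)
```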

Lemma 9: The proposed algorithms in Section 5.2 can answer all queries within their deadlines if $d_i > c_2(\mathcal{M}) \cdot T \cdot 2R$ for the i-th query, where R is the radius of the communication graph G.

Lemmas 8 and 9 together imply that the given set of queries is schedulable, which yields the following sufficient condition.

Theorem 7: Equation (5) is a sufficient condition for schedulability of a set of data collection queries.
An example similar to Figure 2 shows that the sufficient condition in Theorem 7 is tight.

6 Simulation results

We randomly deploy a set of nodes $\{v_1, \ldots, v_n\}$ with transmission range 50 in a 400 × 400 area, and we always ensure that the resulting network is connected. Two nodes $v_i$ and $v_j$ share a feasible link if $|v_i v_j| < 50$. In addition, each link $(v_i, v_j)$ has a quality variable $q_{v_i v_j}$ whose value is proportional to $|v_i v_j|$.¹

1. In TOSSIM [25], link quality is expressed in dBm, and -115 dBm is the critical value for a valid link.

The sink node generates up to 20 queries. A typical query message is ⟨queryID_i, queryType_i, dataType_i, startingTime_i, timePeriod_i, dNodes_i[]⟩. Its fields are as follows:
- queryID_i is the unique query ID.
- queryType_i indicates whether the query is a data collection or a data aggregation query.
- dataType_i is one of the data types.
- startingTime_i is the time at which a node starts its duty cycle.
- timePeriod_i is the duty-cycle period.
- dNodes_i[] contains the IDs of all nodes that should respond to the i-th query.

Table 1 shows a typical query in our simulation.

TABLE 1
A typical query

queryID | queryType | dataType | startingTime | timePeriod | dNodes[]
1 | 1 (data collection) | 3 | 13:55:00 03/05/2010 | 5 (mins) | 5, ..., 29, 42, ..., 72, 81

The query with ID 1 in Table 1 requires every node with ID in {5, ..., 29, 42, ..., 72, 81} to start sampling the 3rd type of data at 13:55:00 on 03/05/2010 with a period of 5 minutes. Each such node sends its sample readings to the sink node under the collection operation.

The evaluation proceeds as follows. First, the sink node generates a set of data aggregation (or collection) queries and broadcasts them to the network one by one. The broadcast of the i-th query continues until every node in dNodes_i[] has received it correctly. Second, the sink node starts constructing CDS-based routing trees for data aggregation/collection. These trees cover all nodes in dNodes_i[] and may use nodes outside dNodes_i[] as relays. When startingTime_i arrives, each node in dNodes_i[] reads the corresponding data repeatedly with period timePeriod_i and transmits it via the routing trees. The sink node keeps analyzing all received data packets for each period of each query. Whenever all current queries are satisfied, the sink node releases the next query, up to 20 queries in total. When none of the existing queries is satisfied, the algorithm terminates.

We now evaluate the performance of our algorithms in different scenarios. In the first scenario, we vary the network size from 50 to 250 in steps of 25. For each query, we either pick the number of source nodes randomly or randomly pick half of the nodes as source nodes. Figure 3(a) shows the results for both settings. The success ratio equals the number of successful rounds divided by the total number of rounds for the existing queries.

When the network size exceeds 150, the success ratio drops quickly from around 0.8 to 0.35. The main cause is the capacity bound of the CDS. Packets from the newly added nodes (and hence newly added source nodes) saturate the CDS, so many packets are dropped due to the buffer limit.

In the second scenario, we fix the network size and increase the number of source nodes in each query from 10 to 100 in steps of 10. Figure 3(b) shows the success ratio for network sizes of 100 and 200. When the number of source nodes is small (less than 50), most queries are satisfied. When it exceeds 50, the performance drops quickly. In addition, the results for the two network sizes (100 and 200) differ little.

Fig. 3. The performance of the data collection algorithm; α is the ratio of the number of chosen source nodes to the total number of nodes.

Fig. 4. The performance of the data aggregation algorithm; α is the ratio of the number of chosen source nodes to the total number of nodes.
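One way to reproduce a deployment like the one at the start of this section is sketched below. The paper does not say how connectivity is enforced. As an assumption, each new node is resampled until it falls within range of an already placed node, which guarantees a connected graph. The function names and the seed parameter are hypothetical.

```python
import math
import random


def deploy_connected(n, side=400.0, comm_range=50.0, seed=None):
    """Place n nodes in a side x side square so that the resulting graph is connected.

    Assumed heuristic: each new node is resampled uniformly until it lies
    within comm_range of at least one node that is already placed.
    """
    if n < 1:
        raise ValueError("need at least one node")
    rng = random.Random(seed)
    pos = [(rng.uniform(0, side), rng.uniform(0, side))]
    while len(pos) < n:
        p = (rng.uniform(0, side), rng.uniform(0, side))
        if any(math.dist(p, q) < comm_range for q in pos):
            pos.append(p)
    # Feasible link between v_i and v_j iff |v_i v_j| < comm_range.
    adj = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(pos[i], pos[j]) < comm_range:
                adj[i].append(j)
                adj[j].append(i)
    return pos, adj


def is_connected(adj):
    """Depth-first search from node 0 reaches every node."""
    seen, stack = {0}, [0]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(adj)


pos, adj = deploy_connected(100, seed=7)
print(len(pos), is_connected(adj))  # 100 True
```

This heuristic biases placements toward already occupied areas, unlike pure uniform sampling. Rejecting whole disconnected deployments would keep placements uniform. However, with 50 nodes the expected degree is only about 2.5, so that approach can need many retries.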

For data aggregation, we test our algorithm under the same network configuration as for data collection. As Fig. 4(a) and (b) show, the success ratio drops smoothly both when we increase the network size and when we increase the number of source nodes at a fixed network size. This indicates that our CDS-based algorithms work well for data aggregation.
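The success ratio plotted in Figures 3 and 4 can be computed as below. The sketch assumes that a round is one period of one query and counts as successful when that query is answered in time. The log format is hypothetical.

```python
def success_ratio(rounds):
    """rounds: one boolean per round of an existing query (True = answered in time)."""
    if not rounds:
        raise ValueError("no rounds recorded")
    return sum(rounds) / len(rounds)


# Hypothetical log: three queries observed for 4, 4 and 2 periods.
log = {1: [True, True, False, True], 2: [True, False, False, True], 3: [True, True]}
all_rounds = [ok for per_query in log.values() for ok in per_query]
print(success_ratio(all_rounds))  # 0.7
```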

7 Related work 

7.1 Scheduling for Queries 

Scheduling has been well studied in the literature for both single-node and multi-node scenarios. For an extensive survey of optimization and approximation results in deterministic sequencing and scheduling, readers may refer to [10]. The first sufficient condition for whether a set of queries (or requests) can be answered by a rate-monotonic method on a single processor was presented in [18], and it was further extended in [11], [29]. Packet generalized processor sharing (P-GPS) [27] provides a flow control method for a more realistic packet transmission environment, and some statistical analysis of P-GPS is provided in [40]. A P-GPS server uses the packet departure time as the priority for scheduling packet transmissions. A number of packet scheduling methods offer desirable traffic fairness protection and can provide bounded delays. Examples include P-GPS [23] and its variants, such as weighted fair queueing (WFQ) and worst-case fair WFQ [4]. These methods could serve as a foundation for designing efficient and effective scheduling methods for multiple data collection/aggregation flows in multi-hop WSNs. In [19], the authors studied fair scheduling in wireless packet networks. However, we cannot apply the above methods directly, because they consider neither wireless interference constraints nor multicast or data aggregation flows. Additionally, they have potentially high computational complexity.
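For reference, the classical single-processor rate-monotonic utilization test of Liu and Layland accepts n periodic tasks when $\sum_i C_i/P_i \le n(2^{1/n}-1)$. This bound decreases toward $\ln 2 \approx 0.693$, which is close to the constant 0.69 in the sufficient conditions of Sections 4 and 5. Whether this is exactly the condition of [18] is our assumption. The sketch below is a minimal illustration.

```python
import math


def liu_layland_bound(n):
    """Utilization bound n * (2**(1/n) - 1) for n periodic tasks."""
    return n * (2 ** (1 / n) - 1)


def rm_utilization_test(tasks):
    """Sufficient (not necessary) rate-monotonic test; tasks are (C_i, P_i) pairs."""
    if not tasks:
        return True
    utilization = sum(c / p for c, p in tasks)
    return utilization <= liu_layland_bound(len(tasks))


print(round(liu_layland_bound(2), 4))         # 0.8284
print(round(liu_layland_bound(1000), 4))      # 0.6934
print(round(math.log(2), 4))                  # 0.6931
print(rm_utilization_test([(1, 4), (1, 5)]))  # True: utilization 0.45
```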

Recently, real-time query scheduling in WSNs has been studied under the assumption of a pre-given routing tree. Such results are limited because different routing structures can strongly affect the delay performance and the flow data rates a WSN supports. Given a routing structure, the ultimate goal at a node is to schedule packet transmissions so that the end-to-end delay requirements of all packets of all flows are met whenever possible. Earliest Deadline First (EDF) scheduling provides optimal delay bounds at a single node. Several extensions of EDF have been proposed to provide end-to-end delay guarantees. One example is EDF with traffic shapers [30]-[32], which regulate the traffic distorted by the EDF scheduler to cope with traffic burstiness. Unfortunately, an optimal traffic shaper is in general infeasible and introduces additional packet delays. Another approach, taken by deadline-curve-based EDF (DC-EDF) [42] and similar schemes, is to judiciously adjust the local deadlines of packets at a node. Based on the traffic load and end-to-end deadlines, DC-EDF can guarantee end-to-end delay bounds and provides a schedulable region as large as that of RC-EDF [42].

7.2 Data Aggregation/Collection

Data aggregation in WSNs has been well studied recently [20]-[22], [35]. Scheduling a single sporadic query for data aggregation/collection with minimum delay has been proven NP-hard [10] and has been well studied in [12], [38], [39].

A collision-free scheduling method for data collection is proposed in [16]; it aims to optimize energy consumption and reliability.

8 Conclusions 

We proposed a joint design of a family of routing and packet scheduling schemes under different interference models. Most importantly, we proved that our algorithms achieve a constant approximation ratio in terms of schedulability. We also studied the overloaded case, in which not all queries can be scheduled. For this case, we proposed an efficient method for selecting a subset of queries that approximately maximizes the overall weight of the scheduled queries, and we proved that this scheme also achieves a constant approximation.